Speed up Dataset._construct_dataarray #4744

Merged
merged 2 commits into from
Jan 5, 2021


dcherian
Contributor

  • Tests added
  • Passes isort . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Significantly speeds up _construct_dataarray by iterating over ._coord_names instead of .coords. This avoids unnecessarily constructing a DatasetCoordinates object and massively speeds up repr construction for datasets with large numbers of variables.
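The idea can be illustrated with a toy model. This is only a sketch, not xarray's code: `ToyDataset`, `construct_slow`, and `construct_fast` are hypothetical names, and the `coords` property deliberately rebuilds a mapping on each access to stand in for the cost of constructing a coordinates object.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; not xarray's implementation."""

    def __init__(self, variables, coord_names):
        # name -> tuple of dimension names
        self._variables = dict(variables)
        # names of the variables that are coordinates
        self._coord_names = set(coord_names)

    @property
    def coords(self):
        # Assumption for illustration: every access scans all variables
        # and builds a new mapping object.
        return {k: v for k, v in self._variables.items() if k in self._coord_names}

    def construct_slow(self, name):
        # Goes through the freshly built coords mapping.
        needed = set(self._variables[name])
        return {k: dims for k, dims in self.coords.items() if set(dims) <= needed}

    def construct_fast(self, name):
        # Iterates over the stored set of coordinate names directly.
        needed = set(self._variables[name])
        return {
            k: self._variables[k]
            for k in self._coord_names
            if set(self._variables[k]) <= needed
        }


ds = ToyDataset(
    {"time": ("time",), "x": ("x",), "a": ("time",), "b": ("time", "x")},
    coord_names={"time", "x"},
)
same = ds.construct_fast("a") == ds.construct_slow("a")
```

Both methods select the same coordinates. Because `_coord_names` is a set in this sketch, `construct_fast` does not guarantee the order of the resulting keys.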

Construct a 2000-variable dataset:

import numpy as np
import xarray as xr

a = np.arange(0, 2000)
b = np.char.add("long_variable_name", a.astype(str))
coords = dict(time=np.array([0, 1]))
data_vars = dict()
for v in b:
    data_vars[v] = xr.DataArray(
        name=v,
        data=np.array([3, 4]),
        dims=["time"],
        coords=coords
    )
ds0 = xr.Dataset(data_vars)

Before:

%timeit ds0['long_variable_name1999']
%timeit ds0.__repr__()

1.33 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.66 s ± 52.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

%timeit ds0['long_variable_name1999']
%timeit ds0.__repr__()

10.5 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
84.2 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
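As a rough check on the timings above, the mean values can be turned into speedup factors. The variable names are illustrative. Only the four means reported in this description are used, with the run-to-run spread ignored.

```python
# Mean timings reported above, converted to seconds.
getitem_before = 1.33e-3   # 1.33 ms
getitem_after = 10.5e-6    # 10.5 µs
repr_before = 2.66         # 2.66 s
repr_after = 84.2e-3       # 84.2 ms

getitem_speedup = getitem_before / getitem_after
repr_speedup = repr_before / repr_after

print(f"__getitem__: about {getitem_speedup:.0f}x faster")
print(f"__repr__:    about {repr_speedup:.0f}x faster")
```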


dcherian commented Jan 5, 2021

Merging. This logic could be updated if we convert _coord_names to a list or OrderedSet.
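To illustrate why the container type matters here: a plain `set` makes no promise about iteration order, while an insertion-ordered container allows direct iteration in a stable order. The sketch below uses `dict` keys as a stand-in for an ordered set. This is an assumption for illustration, not what xarray does, and the coordinate names are made up.

```python
# Hypothetical coordinate names, in the order they were added.
names_in_order = ["time", "lat", "lon"]

# dict keys keep insertion order and still give constant-time membership tests.
ordered_names = dict.fromkeys(names_in_order)

has_lat = "lat" in ordered_names
iteration_order = list(ordered_names)

# A set holds the same members but makes no promise about iteration order.
unordered_names = set(names_in_order)
same_members = unordered_names == set(ordered_names)
```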

@dcherian dcherian merged commit 9fefefb into pydata:master Jan 5, 2021
@dcherian dcherian deleted the perf/_construct_dataarray branch January 5, 2021 17:32